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Abstract 

We describe an efficient algorithm to compute forces in quantum Monte Carlo using adjoint 
algorithmic differentiation. This allows us to apply the space warp coordinate transformation 
in differential form, and compute all the 3M force components of a system with M atoms with a 
computational effort comparable with the one to obtain the total energy. Few examples illustrating 
the method for an electronic system containing several water molecules are presented. With the 
present technique, the calculation of finite-temperature thermodynamic properties of materials 
with quantum Monte Carlo will be feasible in the near future. 
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I. INTRODUCTION 



In the last few years, we have seen a remarkable progress in the ab-initio simulation 
of realistic electronic systems based on first principles quantum mechanics. Despite the 
power of density functional theory (DFT), with standard local density approximation (LDA) 
and generalized gradient approximation (GGA) functionals, much effort has been devoted 
to schemes that are able to describe more accurately the electronic correlations. This is 
because several materials - such as high temperature superconductors - are indeed strongly 
correlated. Furthermore, long-range dispersive forces may be extremely important even 
in simple and fundamental materials, like water, and are notoriously difficult to describe 
with standard DFT. A promising many-body approach, alternative to DFT, is the so called 
quantum Monte Carlo (QMC) method, allowing one to include the electronic correlations 
by means of a highly-accurate many-body wave function (WF), sampled by a statistical 
method. All the basic ingredients of the electronic correlation are described explicitly within 
this framework. This is particularly appealing because its computational cost scales rather 
well with the number of electrons A^, with a modest power, e.g., A^^ or A^^. Due to this 
important property, QMC is very promising for large scale calculations especially when 
compared with standard post-Hartree-Fock methods. In fact, these methods are also capable 
to describe rather well the electronic correlations. However, they typically require a larger 
computational cost: from polynomial in the range between A^^ and A^'', to exponential 
complexity in full configuration interaction (FCI) schemes. 

Despite the clear advantage of QMC for the accurate electronic simulation of materials 
containing a large number of atoms, its application has been mainly limited to total energy 
calculations. In particular, no general method to calculate ionic forces, that remains efficient 
even for a large number of atoms M, is available so far. Indeed, many-body forces usually 
lead to cumbersome expressions, whose evaluation can be done at present only by means of 
complicated and computationally expensive algorithms. For this reason, such calculations 
have been implemented so far only for particularly simple cases. For instance, it was recently 
possible [li to simulate ~ 100 hydrogen atoms in the low temperature, high pressure phase, 
with the calculation of the ionic forces based on efficient strategies to work with finite 

. Unfortunately, this route cannot be followed in general because 
it works very efficiently only for light atoms. 



2 



On the other hand, an accurate algorithm for the calculation of forces that is in principle 
efficient also with heavy atoms, and with the use of pseudopotentials, was introduced a long 
time ago j4|. This method is based on the so called the space warp coordinate transformation 
(SWCT), allowing a zero- variance expression - i.e., maximum efficiency in QMC - even for 
isolated atoms. We believe that, without a highly effective variance reduction scheme in the 
calculation of forces, such as SWCT, there is no hope to extend the applicability of QMC to 
structural optimization of complex and correlated materials, or to perform ab-initio molec- 
ular dynamics simulation at finite temperature. However, even when using this promising 
approach, or others based on the zero- variance principle |3|, standard implementations, e.g., 
based on finite-difference approximations of the derivatives appearing in the expressions for 
the ionic forces, are still computationally very expensive - to the point of being unfeasible 
for a large number of atoms. 

In this work, we propose a simple strategy for the efficient calculation of the ionic forces 
- and generally of any arbitrarily complicated derivative of the QMC total energy - by using 
adjoint algorithmic differentiation. We will show that this method will allow us to achieve 
two very important targets: 

1. the numerical implementation of all the energy derivatives will be possible in a straight- 
forward way without any reference to complicated expressions, as for instance the 
terms shown in Ref. l5|, when pseudopotential are used to remove the effect of core 
electrons; 

2. more importantly, the calculation of an arbitrarily large number of energy derivatives 
will be possible with a computational effort comparable with the one to compute the 
energy alone. In other words, once the calculation of the energy is well optimized, by 
means of AAD, also the one for the energy derivatives will be almost optimal. 

The latter property is rather remarkable, and suggests that most of the advanced ab-initio 
tools belonging to a rather restrictive approximation of the DFT functional, such as struc- 
tural optimization and molecular dynamics at finite temperature, can be extended to highly 
accurate many-body approaches based on QMC, with a computational effort that remains 
affordable even for a large number of atoms. 

Though in most of the examples presented here we have used the standard variational 
QMC method, the technique we propose in this paper directly applies also to more accurate 



QMC projection schemes, such as (lattice regularized) diffusion Monte Carlo. However one 
has to take into account that in this case there are unsolved technical problems - e.g., infinite 
variance in the most accurate estimators - that do not depend on the technique we propose, 
and are outside the main scope of this paper. In order to avoid confusion, we anticipate 
that we apply the AAD technique only to the specific calculation of the wave function and 
the local energy (see later for their definitions) required for the VMC or DMC (LRDMC) 
evaluation of the average energy. In principle AAD can be applied to the whole algorithm, 
but we have not exploited this possibility, that may be interesting for future applications. 

II. QMC WAVE FUNCTION, VMC AND LRDMC 

In this Section we begin by describing the WF that we have used in our QMC calculations. 
In the following we will denote with r a generic three dimensional electronic position, whereas 
X = {ri,r2, ■ ■ ■ ,riv} stands for a configuration of all electron positions and spins; the A^-j- 
spin-up electrons are at positions rj with I < i < , and the spin-down electrons are 
at positions Nf + 1 < i < N. 

The usual trial WF used in the QMC calculation is the product of an antisymmetric 
part, and a Jastrow factor. The antisymmetric part is a single Slater determinant, while the 
Jastrow factor is a bosonic many-body function which accounts for the dynamic correlations 
in the system. Our Slater determinant is obtained with N/2 doubly occupied molecular 
orbitals_?/'j(r), expanded over L atomic Gaussian orbitals 0j(r), centered at atomic positions 
Rj, as p: 

L 

M^^) = ^Xij(pj{r) (1) 

where the coefficients Xij? as well as the non-linear coefficients, appearing in the exponents 
of the Gaussians, can be fully optimized by energy minimization as described later. 

The molecular orbitals are initialized from a self consistent DFT-LDA calculation, in the 
same atomic basis. The Jastrow factor takes into account the electronic correlation between 
two electrons, and is conventionally split into an homogeneous interaction J2, depending 
on the relative distance between two electrons, and two non- homogeneous contributions 
J3 and J4, depending on the positions of two electrons and one atom, and two electrons 
and two atoms, respectively. It also contains an inhomogeneous term J2, describing the 
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electron-ion interaction. This is important to compensate for the change in the one particle 
density induced by J2, J'i and J4, as well as to satisfy the electron- ion cusp conditions. The 
homogeneous and inhomogeneous two-body terms J2 and J2 are defined by the following 
equations: 



k = exp [ ^ -{2Z^f/\{zl/'n^) + atxiAri)] , (2) 

ia ial 

and 

J2 = exp[J^n(rij)], (3) 

where i,j are indices running over the electrons, and / runs over different single particle 
orbitals xii centered on the atomic center a; r^a and denote the electron-ion and the 
electron-electron distances, respectively. The corresponding cusp conditions are fixed by the 
function u{r) = F[l — exp(— r/F)]/2 (see e.g., Ref. jsj), whereas gi and F are optimizable 
variational parameters. 

The three and four-body Jastrow terms J3J4 are given by: 



J3(x) J4(x) = exp I ^ f{Ti, Tj) J 
V i<j / 



(4) 

i<j 

with /(r,, Tj), being a two-electron coordinate function that can be expanded into the same 
single-particle basis used for J2: 

/(r„ r,) = J2 9tt xil{r^)xLi^j), (5) 

ablm 

with gf^ optimizable parameters. Three-body (electron-ion-electron) correlations are de- 
scribed by the diagonal matrix elements (7"", whereas four-body correlations (electron-ion- 
electron-ion) are described by the matrix elements with a ^ b. 

The complete expression of the Jastrow factor J{x) = J2{x)J2{x)Jz{x)Ji{x) that we 
adopt in this work allows us to take into account weak and long-range electron-electron 
interactions, and it is also extremely effective for suppressing higher energy configurations 
occurring when electrons are too close. 

In order to minimize the energy expectation value corresponding to this WF we have ap- 



plied the well-established energy minimization schemes (8 



- Il0| , that we have recently adapted 



for an efficient optimization of the molecular orbitals of the Slater determinant in presence 
of the Jastrow factor described above [gI- 

In variational Monte Carlo (VMC), the energy expectation value of a given correlated 
WF, depending on a set of variational parameters Ci,i = l,---p, can be computed with 
a standard statistical method. The energy depends in turn on the atomic positions R^, 
a = 1, ■ ■ ■ M, so that we indicate formally: 



(^{q},{r4 




^{c,},{R4) 


(^{c.},{R4 


^{c,},{Ra}) 



In VMC the above energy expectation value is computed statistically by sampling the prob- 
ability n(a;) oc (a;|\I')^. Analogous and more accurate techniques are also possible within 
QMC. In this work the lattice regularized diffusion Monte Carlo (LRDMC) will be also 
used. The latter method is a projection technique, filtering out the ground state component 
of a given variational WF by applying the propagator e"^"^ to \I' for large imaginary time 
r. This propagation is subject to the restriction to modify only the amplitudes of the WF 
without affecting its phases. In this way one can avoid the so called "fermion sign problem" 
instability in QMC, with an highly accurate technique^ providing a rigorous upper bound of 
the total energy even in presence of pseudopotentials ll| . 

III. SPACE WARP COORDINATE TRANSFORMATION AND ITS DIFFEREN- 
TIAL FORM 

The main purpose of this Section is to describe an efficient method to compute the forces 
F acting on each of the M nuclear positions {Ri, . . . , Ra/ }, namely, 

F(R.) = -VR„i5;({c,},{Rj), (7) 



with a reasonable statistical accuracy. Following Ref. |9|, we introduce a finite-difference 
operator A/AR^ for the evaluation of the force acting on a given nuclear position R^, 

A E{Ka + ARa) - E{Ka) 

^ = ^ (8) 



AR„ AR, 

so that 



FiRa) = -^E + OiAR) (9) 
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where AR^ is a three dimensional vector. Its length AR can be chosen as small as 10~^ 
atomic units, yielding negligible finite-difference errors for the evaluation of the exact energy 
derivative. 

In order to evaluate the energy differences in Eq. it is very convenient to apply the 
space warp coordinate transformation (SWCT). This transformation was introduced a long 
time ago in Ref. \4, for an efficient calculation of the ionic forces within VMC According 
to this transformation, as soon as the ions are displaced, also the electronic coordinates r 
will be translated in order to mimic the displacement of the charge around the nucleus Ra, 
namely x ^ x, with 

fi = ri + ARa ^ai^i), (10) 

Ej£iF(|r-R.|) 

and F(r) is a function which must decay rapidly; here we used F{r) = as suggested in 



= ^irL 1 . (11) 



Ref. 
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The expectation value of the energy depends on AR^, because both the Hamiltonian and 
the WF depend on the nuclear positions. Applying the SWCT to the integral involved in 
the calculation, the expectation value reads 

E Ra + ARJ = ^ AH„v^; AR„v V l — l±Jl 12) 

where J is the Jacobian of the transformation and, 

- (vI^arJx) 

is the so called local energy defined by the wave function ^E^ar^ on a real space electronic 
configuration x. In the following, we define = for AR^ = 0. 

The importance of the SWCT in reducing the statistical error in the evaluation of the 
force is easily understood for the case of an isolated atom a. In this case, the force acting on 
the atom is obviously zero, but only after the SWCT with Ua = 1 the integrand in Eq. f|T2|) 
is independent of AR^, providing an estimator of the force with zero variance. 

Starting from Eq. ( |T2l) . it is straightforward to derive explicitly a differential expression 
for the force estimator, which is related to the gradient of the previous quantity with respect 
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to ARa in the limit of vanishing displacement, 

where the brackets indicate a Monte Carlo like average over the square modulus of the trial 
WF, namely over the probability 11(2;) introduced in the previous Section. In the calculation 
of the total derivatives d/dHa, we have to take into account that the electron coordinates 
are also implicitly differentiated, according to the SWCT. Then, all the terms above can be 
written in a closed expression once the partial derivatives of the local energy and of the WF 
logarithm are known, namely, 

, f) ^ f) 



i=l 



N 



^ iog(jV2vi.) = ^\ogm + Yl 



dHa dHa . 

1=1 



c..(r.)|-logv^ + i|-u;.(r. 



(16) 



where the term |^a;a(rj) in the square brackets gives the contribution of the Jacobian. 

Based on the expressions above we can evaluate the forces in Eq. (JT^ in three different 
ways, listed below in increasing order of efficiency: 

1. Helmann-Feynmann (HFM). By neglecting all the dependence of \1/ on the atomic 
position, and without the SWCT: 

This is the least accurate expression, because without the so called Pulay terms derived 
in Eq. ( fT6i) the force is consistent with the energy derivative only when the WF is 
the exact ground state. This is also the least efficient way to compute the forces as 
indicated in Tab. HI 

2. No-SWCT. This is obtained by using the terms fll5lll6p in Eq. f lT4|) with Ua{r) = 0. 
This expression is much more accurate and efficient than the previous one, as it fulfills 
the so called "zero- variance principle": if the WF \1/ coincides with the exact ground 
state for arbitrary atomic position R^, it is easy to realize that the estimator of the 
forces does not have statistical fluctuations, as the local energy and its derivative are 
just constant, and independent of the real space configuration x. 
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3. Differential SWCT. The properties above are clearly fulfilled also in this case. More- 
over, the SWCT does not change the mean value of the forces, but affects only their 
statistical fluctuations. In fact, as mentioned before, the SWCT allows us to obtain 
a zero- variance property for the forces acting on isolated atoms. As a result, this 
transformation is extremely important for computing forces between atoms at large 
distance, as in this limit they can be considered isolated. 

The advantage to use the SWCT for a water dimer molecule is illustrated in Tab. [H It is 
clear that without the SWCT , or even without the differentiation of the local energy with 
respect to the atomic position (no-SWCT case), the evaluation of forces with a reasonable 
statistical error is simply not possible. 

As discussed in the following Section, all partial derivatives involved in the expressions 
above, namely the QN+QM components, -§^El, ^ log -^-El, log can be evaluated 
very efficiently with algorithmic differentiation. This is true also when the WF and the 
expression for the local energy are extremely cumbersome, e.g., when using pseudopotentials. 
As a result, the quantities in Eqs. fll5|16p can be evaluated using just a minor computational 
effort with roughly oc NM operations. In particular, one of the most involved contribution 

N 

to the local energy is the one corresponding to the bare kinetic energy K = — ^ ^ Aj. The 

i=l 

Hamiltonian H in Eq. f|T3|) always contains this term, even in presence of pseudopotentials. 
In the following, we will discuss how to differentiate the contribution K = 
to the local energy for the particularly simple but instructive case of a Slater determinant 
WF with no Jastrow factor. 



IV. ADJOINT ALGORITHMIC DIFFERENTIATION 



Algorithmic differentiation (AD) 13| is a set of programming techniques for the efficient 
calculation of the derivatives of functions implemented as computer programs. The main idea 
underlying these techniques is the fact that any such function - no matter how complicated 
- can be interpreted as the composition of more elementary functions each of which is in turn 
a composition of basic arithmetic and intrinsic operations that are easy to differentiate. As 
a result, it is possible to calculate the derivatives of the outputs of a program with respect 
to its inputs by applying mechanically the rules of differentiation - and in particular the 
chain rule - to the composition of its constituent functions. 



What makes AD particularly attractive, when compared to standard (finite-difference) 
methods for the calculation of derivatives is its computational efficiency. In fact, AD exploits 
the information on the calculations performed by the computer code, and the dependencies 
between its various parts, in order to optimize the calculation. In particular, when one re- 
quires the derivatives of a small number of outputs with respect to a large number of inputs, 
the calculation can be highly optimized by applying the chain rule through the instruc- 
tions of the program in opposite order with respect to the one of evaluation of the original 
instructions. This gives rise to the so called adjoint (mode of) algorithmic differentiation 
(AAD). 

Even if AD has been an active branch of computer science for several decades, its impact in 



other research fields has been surprisingly limited until very recently 



141 ]. Only over the past 



two years its tremendous effectiveness in speeding up the calculation of sensitivities e.g., in 



Monte Carlo simulations, has been first exploited in computational finance applications |15 |. 
In particular, the potential of AD has been largely left untapped in the field of computational 
physics where, as we demonstrate in the following, it could move significantly the boundary 
of what can be studied numerically with the computer power presently available. 



Griewank 13| contains a detailed discussion of the computational cost of AAD 16|. Here, 
we will only recall the main ideas underlying this technique to clarify how it can be beneficial 
in the implementation of the calculation of the forces in QMC To this end, we consider a 
particular computer implemented function X 

Y = FUNCTION(X) (17) 

mapping a vector X in in a vector Y in through a sequence of two sequential steps 

X ^ U ^ V ^ Y. 

Here, each step can be a distinct high-level function, or even an individual instruction in a 
computer code. A general code is usually implemented by several steps of this type, and, 
more importantly, the output of a particular instance can be used as an input not only 
for the next one but generally for all instances occurring later in the algorithm. Generally 
speaking an algorithm can be viewed as a sequential tree or graph with connectivity larger 
than one, where each node has more than one children. However, to keep things as simple 



as possible, in this "warm up" example we do not consider these more comp 
generalization to a realistic computational graph is however straightforward 
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ex cases. The 



16|. 



The adjoint mode of AD results from propagating the derivatives of the final result with 
respect to all the intermediate variables - the so called adjoints - until the derivatives with 
respect to the independent variables are formed. Using the standard AD notation, the 
adjoint V of any input variable V of an instance V Y is defined as the derivative of a 

given linear combination of the output ^ Y,!^- with respect to the input V, namely: 

j 

where F is a given input vector in M™. In particular, for each of the intermediate variables, 
using the chain rule, we get, 

dY^^-dY^dV^m 
'dX, 'dVkdUidX, 

where repeated indices indicate implicit summations. It is simple to realize that in such a 
simple case we can use the definition in Eq. ( ITS]) to evaluate X, namely: 

^9Y^^^dv,m^-m^^ 

'dX, ^dUidXi ^dXi 

In other words, once all adjoint instances have been defined, the bar input of each adjoint 
instance can be obtained from the output of the previous adjoint instance according to a 
diagram that follows very straightforwardly the original algorithm in reversed sequential 
order: 

Y ^ X . (19) 

In this way we obtain X, i.e., the linear combination of the columns of the Jacobian of 
the function X -^Y , with weights given by the input Y (e.g., 1, 0, . . . , 0), namely, 

A-.^fy}!!, (20) 

i=i 

with z = 1, . . . , n. 

In the adjoint mode, the cost does not increase with the number of inputs, but it is 
linear in the number of (linear combinations of the) columns of the Jacobian that need to be 
evaluated independently. In particular, if the full Jacobian is required, one needs to repeat 
the adjoint calculation m times, setting the vector Y equal to each of the elements of the 
canonical basis in W^. Furthermore, since the partial derivatives depend on the values of the 
intermediate variables, one generally first has to compute the original calculation storing the 
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values of all of the intermediate variables such as U and V, before performing the adjoint 
mode sensitivity calculation. 

One particularly important theoretical result is that given a computer code performing 
some high-level function f[T7|) . the execution time of its adjoint counterpart 

X = FUNCTION_B(X, Y) (21) 

(with suffix _B for "backward") calculating the linear combination ( 12U]) is bounded by ap- 
proximatively 4 times the cost of execution of the original one. Thus, one can obtain the 
sensitivity of a single output, or of a linear combination of outputs, to an unlimited number 
of inputs for little more work than the original computation. 

The propagation of the adjoints, being mechanical in nature can be automated. Indeed, 
several AD tools [U] are available that, given a function of the form ( I17p . generate the adjoint 
function fl^T]) . While the application of such automatic AD tools to large inhomogeneous 
simulation software is challenging, the principles of AD can be used as a programming 
paradigm of any algorithm. This is especially useful for the most common situations where 
simulation codes use a variety of libraries written in different languages, possibly linked 
dynamically. However, automatic tools are of great utility to generate the adjoint of self 
contained functions and subroutines thus effectively reducing the development time of adjoint 
implement at ions . 

A detailed tutorial on the programming techniques that are useful for adjoint implementa- 
tions is beyond the scope of this paper. However, when hand-coding the adjoint counterpart 
of a set of instructions in a general algorithm it is often enough to keep in mind just a few 
practical recipes, for instance: 

i) As previously mentioned, each intermediate different iable variable U can be used not 
only by the subsequent instance but also by several others occurring later in the 
program. As a result, the adjoint of U has in general several contributions, one for 
each instruction of the original function in which U was on the right hand side of 
the equal sign (assignment operator). Hence, by exploiting the linearity of differential 
operators, it is in general easier to program according to a syntactic paradigm in which 
adjoints are always updated so that the adjoint of an instruction of the form 



V = V{U) 
12 



reads 



oUi 



Clearly, this implies that the adjoints have to be appropriately initialized as discussed 
in the following paragraphs. In particular, to cope with input variables that are 
changed by the algorithm (see next point) it is generally best to initialize to zero the 
adjoint of a given variable in correspondence of the instruction in which it picks its 
first contribution (i.e., right before the adjoint corresponding to the last instruction of 
the original code in which the variable was to the right of the assignment operator). 
For instance, the adjoint of the following sequence of instructions 

y = F{x) 
z = H{x,y) 
X = G{z) 

can be written as: 

z = 

dG(z) 

z = z+ 

oz 

y = 

X = 

_ dH{x,y)_ 

X ^ X -\ z 

ox 

_ _ , dH{x,y) _ 

y = y + — 7^ z 

dy 

dFix) _ 

X = X H — y . 

ox 

Note in particular that the symbol x represents the input and the output of the 
algorithm. As a result, also x represents both the input and the output of the adjoint 
algorithm and it is crucial to reinitialize to zero x before it picks its first contribution 
from an adjoint statement, i.e., the one associated with the instruction z = H{x,y). 
As explained in detail in the following example, the algorithm could be more easily 
understood by replacing the last statement by another independent output variable 
u — G{z), and following the straightforward derivation of the adjoint algorithm that 
has for input u and output x (see also the example below) . One can easily derive that 
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the resulting algorithm coincides with the one above, namely the same input provides 
the same output. 

ii) In some situations the input C/ of a function V — V{U) is modified by the function 
itself. This situation is easily analyzed by introducing an auxiliary variable U' repre- 
senting the value of the input after the function evaluation. As a result, the original 
function can be thought of the form {V, U') = {V{U), U'{U)), where V{U) and U'{U) 
do not mutate their inputs, in combination with the assignment U = U', overwriting 
the original input U. The adjoint of this pair of instructions clearly reads 

u: = 

where we have used the fact that the auxiliary variable U' is not used elsewhere (so 
Ul does not have any previous contribution), and 

Ui = 

where, again, we have used the fact that also the original input U is not used after the 
instruction V — V{U), as it gets overwritten. One can therefore eliminate altogether 
the adjoint of the auxiliary variable U' and simply write 

Very common examples of this situation are given by increments of the form 

Ui = aUi + b 

with a and b constant with respect to U. According to the recipe above, the adjoint 
counterpart of this instruction simply reads 

These situations are common in iterative loops where a number of variables are typi- 
cally updated at each iteration. 



14 



In order to better illustrate these ideas, here we consider the calculation of the kinetic en- 
ergy and of its adjoint counterpart, where for simplicity we consider only one spin component 
(maximum polarized the calculation of both spin up and spin down contributions 

can be obtained just by summing them. Given the position of the electrons x and of the 
ions R, the calculation of the kinetic energy can be performed according to the following 
steps, as derived in App. \M 

1. Calculate Aij = ipiivj) and Bi j = Ajipi{rj) according to the definition of the molecular 
orbitals in Eq. ([T]). 

2. Calculate Ay} by matrix inversion. 

3. Calculate the kinetic energy as 

^ = -^E^^/^^> (22) 

The corresponding adjoint algorithm can be constructed by associating to each of the steps 
above its adjoint counterpart according to the correspondence given by Eqs. (fTTl) . (!20|) . 
and ( pTl) . As a result, as also illustrated schematically in Fig. [H the adjoint algorithm for 
the derivatives of the kinetic energy with respect to the positions of the electrons and ions, 
consists of steps 1-3 above, and their adjoint counterparts executed in reverse order, namely: 

3. Set K = 1, and evaluate the adjoint of the function {A^j, Bj^i) — )• K defined in step 3. 
This is a function of the form {Ayj,Bij,K) {A~j,Bij) with A^ j = -\KBj^i and 
B., = -IKAJI 

2. Evaluate the adjoint of the function Aij — )■ A~j (step 2), namely, (Aij,A~j) — )■ Aij 
with A = -{A-YA-\A-Y (see App. E]). 

1. Evaluate the adjoint of the function {x,R) — )■ {Aij,Bij) (step 1), namely, 
{x, R, Aij, Bij) — i- {x,R) with 

dK 

^ i 

dK 

Ra = = [AjdKa'^iirj) + B,jdB_^Aj^i{rj)] 

" hi 

= Yxi,kSii^:,R^[Aijdn^:M^j) + B^,j^n^Aj(f)k{rj)] , 
15 



where in the latter equahty we have expanded the orbitals in terms of atomic orbitals, 
by means of Eq. ([1]) . 

Notice that in the last expression it is the presence of the Kronecker delta, that allows the 
computation of all the derivatives with respect to the atomic positions in ~ 2N'^L oper- 
ations, namely the same amount of operations used in the forward step. Indeed by summing 
only once over the three indices i,j, k in the above expression all the force components acting 
on all the atoms are obtained. 

In AAD this is not accidental, and the structure of the algorithm is automatically opti- 
mized for computing several derivatives at the cheapest computational cost. 

By applying the chain rule it is immediate to see that x and R computed according to 
the steps above are the derivatives of the kinetic energy with respect to the position of the 
electrons and the ions, respectively. It is also easy to realize that - as ex pec ted according to 
general results on the computational complexity of adjoint algorithms 13|] quoted above - 
the number of operations involved in each adjoint step is a small constant times the number 
of operations of the original step, namely (considering only multiplications) 2N'^L vs N'^L, 
2N^ vs A^^, and 2N'^ vs A^^ for steps 1 vs 1, 2 vs 2, and 3 vs 3, respectively. 

As also anticipated, the propagations of the adjoints (steps 3 — 1) can be performed only 
after the calculation of the kinetic energy has been completed (steps 1 — 3) and some of the 
intermediate results (e.g., the matrices A, B, and A~^) have been computed and stored. This 
is the reason why, in general, the adjoint of a given function generally contains a forward 
sweep, reproducing the steps of the original function, plus a backward sweep, propagating the 
adjoints. This construction can be clearly applied recursively for each of the steps involved 
in the calculation. 

It is worth noting that each adjoint step, taken in isolation, contains in turn a forward 
sweep, recovering the information computed in the original step that is necessary for the 
propagation of the adjoints. However, this can be clearly avoided by storing such informa- 
tion at the time it is first computed in the original step. Strictly speaking, this is necessary 
to ensure that the computational cost of the overall algorithm remains within the expected 
bounds. However, there is clearly a tradeoff between the time necessary to store and retrieve 
this information and the time to recalculate it from scratch, so that in practice it is often 
enough to store in the main forward sweep only the results of relatively expensive compu- 
tations. In the example above for instance, significant savings can be obtained by storing 
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the inverse of the matrix A at the output of step 2 and passing it as an input of Step 2 (see 
Fig. [ID . 

The main comphcation in the algorithm above is the implementation of the adjoint of 
the Laplacian of the WF in step 1. However, the calculation of the Laplacian is a good 
example of an instance that can be represented by a self contained, albeit complex, computer 
function, for which several automatic differentiation tools are available. In particular, in 
order to complete step 1, it is enough to define the adjoint functions of the calculation of 
the Laplacian r — )■ A0j(r), for a set of explicit functions {(pj} (e.g. gaussians). The adjoints 
are then computed by means of the corresponding gradient of the Laplacian, namely — > f 



where r = r + ?/'V^?A0j(r). For this a£ 



ahcation we have used TAPENADE, developed at 



INRIA by Hascoet and collaborators 17|. 



V. RESULTS 



After implementing the adjoint counterpart of the two main instances corresponding to 
the evaluation of the log WF and the local energy, we have computed the exact energy 
derivatives and compared with the straightforward finite-difference evaluation, finding per- 
fect agreement within numerical accuracy. However, the finite-difference method presents a 
well known bottleneck: in order to evaluate the 3M energy derivatives, one has to evaluate 
the local energy and the log WF at least 3M times more. Since the computation of such 
quantities is the most relevant part in QMC, with a computational effort scaling as N^, we 
end up with a very inefficient algorithm for large number of atoms. As shown in Fig. [2l 
this slowing down can be completely removed by using A AD, as the cost to compute all the 
force components in a system containing several water molecules, remains approximately 4 
times larger than the cost to compute only the total energy. This factor 4 is a very small 
cost, if we consider that the main adjoint instance has to be evaluated twice, one for the 
local energy and the other for the WF logarithm, and that, on the other hand, VMC is the 
fastest method in QMC. For instance, we can evaluate forces within LRDMC with only a 
small overhead, as the cost to generate a new independent configuration within LRDMC is 
about 10 times larger than VMC, and therefore, for this more accurate method, the cost to 
compute all force components will be essentially negligible. Analogous consideration holds 
during an energy optimization. We have to consider that in this case AAD can be used to 
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compute not only the force components, but also all the energy derivatives with respect to 
all variational parameters {q} of the WF, essentially at the same computational cost, even 
when the number p of variational parameters is extremely large. 

Though we have not implemented AAD for this general task, we expect a further speed 
up (and simplification) of the code, once AAD will be fully implemented for all possible 
energy derivatives. We believe this will become common practice for future quantum Monte 
Carlo packages. At present, in order to have consistent forces within VMC, all variational 
parameters have to be optimized and to this purpose we have used the standard way 
to compute energy derivatives. 

We have applied the efficient evaluation of the forces for the structural optimization of the 



19| only for the oxygen 



water monomer. We have used energy-consistent pseudopotentials 
atom. In the calculation we have adopted a huge basis set to avoid basis superposition 
errors. The molecular orbitals are expanded in a primitive basis containing 24s22pl0d6flg 
on the oxygen and 6s5pld on the hydrogen atom. The exponents of the Gaussians are 
optimized by minimizing the energy of a self-consistent DFT calculation within the LDA 
approximation j^]. The accuracy in the total DFT energy is well below ImHa for the water 
dimer, implying that we are essentially working with an almost complete basis set. For the 
Jastrow factor we have also used a quite large basis, to achieve similar accuracy in the total 
energy, within a VMC calculation on a WF obtained by optimizing the Jastrow over the 
LDA Slater determinant. The final optimized basis for the Jastrow contains a contracted 
basis 6s5p2d/3s3pld on the oxygen and an uncontracted Islp basis on the hydrogen atom. 

In the following we describe the first application of this method for optimizing the struc- 
ture of simple water compounds. The variational parameters of the WF - molecular orbitals 
and Jastrow factor - are optimized, by energy minimization, with the method described in 
Ref. y. At each step of optimization, we compute the ionic forces by AAD, and we employ 
a standard steepest descent move of the ions Ra R^: 

Rl = R, + ArF, (23) 

where Ar = l/2a.u.. After several hundred iterations both the variational parameters 
and the atomic positions fiuctuate around average values, and we use the last few hundred 
iterations to evaluate the error bars and the mean value of the atomic positions, as illustrated 
in Fig. [31 
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In Tab. [TTlwe show the optimized structure of the water monomer. As it is clearly evident 
our final atomic positions are almost indistinguishable from the experimental ones. Generally 
speaking our calculation appears more accurate than simple mean field DFT methods, and 
comparable with state of the art quantum chemistry techniques, such as CCSD(T). The 
accuracy of the VMC method has been also confirmed recently in another context. 20| 

In the dimer structure the situation is slightly different. As shown in Tab. lIIIt the oxygen- 
oxygen distance is in quite good agreement with experiments, whereas the OHO angle is 
overestimated by few degrees. Probably in this case the quantum corrections should affect 
the hydrogen position between the two oxygens, because the dimer bond is very weak. Indeed 
we have also checked that, with the more accurate LRDMC calculation, the equilibrium 
structure obtained by the VMC method remains stable as all the force components are well 
below l{]~^a.u.. On the other hand LRDMC increases the binding of the dimer by about 
IKcal/mol, showing that, from the energetic point of view, the LRDMC calculation may 
be important, as also confirmed in previous studies . All the above calculations can 

be done with a relatively small computational effort (few hours in a 32 processor parallel 
computer), and therefore the same type of calculation, with the same level of accuracy, can 
be extended to much larger systems containing several atoms with modern supercomputers. 

Stimulated by the above success we have tested the finite-temperature molecular dynam- 
ics simulation introduced some time ago using 4 water molecules in a cubic box with 
4. 93 A side length, mimicking the density of liquid water at ambient conditions. Since we 
are interested in static equilibrium properties we have used for the oxygen the same mass of 
hydrogen. Though the system is very small we have been able to perform several thousands 
steps. For each step all variational parameters are optimized using a given number n of 
stochastic reconfiguration (SR) optimizations P]. For the first 18000 steps we used n = 1 
and a time integration step At for the MD ranging from 20a. m. or 40a. m.. Several iterations 
were possible because in QMC we can decide to work with a relatively small number of 
samples to accumulate statistics for the energy derivatives and the forces. In these condi- 

n 

tions the forces are rather noisy but the molecular dynamics with noise correction p| allows 
us to have sensible results, at the price to have an overdamped dynamics. However, as it 
is shown in Fig. HJ it is difficult to remain within the Born-Oppheneimer energy surface, 
because at selected times, we have fully optimized the wave-function using further 200 SR 
iterations, and found 6mHa difference between the energy on fly and the optimized energy. 
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In order to overcome this bias in the dynamics, in the final part of the MD simulation, we 
have used n — 10 (and At — AOa.u.) and found that we remain sufficiently close to the 
Born-Oppheneimer energy surface. This very preliminary application is clearly limited by 
the too small number of water molecules considered in the simulation, and therefore docs not 
allow us to determine the equilibrium properties of liquid water. Nevertheless, we believe 
that this result is rather encouraging because it shows that all the possible sources of errors 
in the MD driven by QMC forces, can be controlled in a rather straightforward way. 

VI. CONCLUSIONS 

In this work we have shown that the calculation of all the force components in an elec- 
tronic system containing several atoms, can be done very efficiently using adjoint algorithmic 
differentiation (AAD). In particular it is possible to employ the very efficient space warp 
coordinate transformation (SWCT) in differential form in a straightforward and simple way, 
even when pseudopotentials and/or complicated many-body wave functions are used. More 
importantly, we have shown that, using AAD, one can compute all these force compo- 
nents, and in principle all the energy derivatives with respect to any variational parameter 
contained in the many-body wave function, in about four times the cost to compute the 
expectation value of the energy. 

So far, for large number of atoms, the use of quantum Monte Carlo methods have been 
generally hmited to total energy calculations. We beheve that our work opens the way for 
new and more accurate tools for ab-initio electronic simulation based on quantum Monte 
Carlo. In particular we have shown that it is possible to perform an ab-initio molecular 
dynamics simulation for several picoseconds, in a system containing four water molecules. 
Since the cost of a variational Monte Carlo calculation with fixed statistical accuracy in 
the energy per atom (total energy) increases with the number of atoms as (M^) the 
simulation of about 32 water molecules should be possible with less than 10^ (10^) CPU 
hours, a figure that is nowadays possible (at the limits of present possibilities) with modern 
massively parallel supercomputers. It is not known at present if it is sufficient to target a 
fixed statistical error in the energy per atom in order to obtain well-converged thermody- 
namic extensive quantities. Otherwise a computationally more expensive calculation with 
a statistical error on the total energy of the order of kT is necessary, as in the penalty 
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method 221. 



In the example we have presented, we have also seen that the accuracy of variational 
Monte Carlo in determining the equilibrium structure of the water monomer and the water 
dimer is rather remarkable and comparable to post Hartree-Fock methods, requiring much 
more computer resources for large number of atoms. Therefore we believe that, in view 
of the efficiency in the evaluation of forces obtained by AAD, realistic and very accurate 
ab-initio simulation based on quantum Monte Carlo will be within reach in the near future. 
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Appendix A: Calculation of the Laplacian 

In order to compute the Laplacian of a Slater determinant WF: 

{x\S) = detA (Al) 
where A is the N x N matrix defined by: 

A,j = M^i) (A2) 

and the molecular orbitals ipi are defined in Eq. ([T]) and the spin index is omitted for 
simplicity. 

We notice that if we change the electron position of the k^^ electron the matrix A change 
only by a single column: 

A[^ = A, J + [iPiiri) - Ai^k] (A3) 

In this way we can single out the dependence on r'/^ by evaluating explicitly the ratio of the 
two WF's: 

^ = detA-A' = (A4) 

j 

The evaluation of the Laplacian with respect to the k*^ electrons easily follows by linearity 
of differential operations such as the Laplacian, so that we finally arrive to Eq. (!22|) . by 
summing over all the electrons. 
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Appendix B: Calculation of the adjoint of an inverse matrix operation 

In this appendix we derive the adjoint instance of the calculation of the inverse A~j^ of 
an input N x N square matrix Aij. To this purpose we notice that an arbitrary linear 
combination of the output (the inverse matrix) can be written as: 

E^J^^/=Tr[(^-^r^-^] (Bl) 
hi 

where the subscript T indicate the transpose of the corresponding matrix, and Tr the conven- 
tional matrix trace (sum of the diagonal elements). As explained in Sec. lIVl by differentiating 
the above equation with respect to an arbitrary variation of the input we obtain the adjoint 
instance. To this purpose we denote the differential change of the input A with the matrix 
DA, so that the corresponding variation can be conveniently written as: 

{A + DA) = A{I + A-^DA) 

and therefore : 

{A + DA)-^ = (/ - A-^DA)A-^ + 0(|DA|2). 

Thus, by simply substituting the above relation in Eq. (IB1|) . and by using the cyclic invari- 
ance of the trace, we obtain: 

dTt [{A-YA-^] = -Tr [A^\A-Y)A-^DA] (B2) 

which implies that the adjoint matrix A, after this instance, is updated as follows: 

A = A+ "^^^ ^^^^"^^^^"'^ = ^ - [A-\A~T)A~'f = A- {A-YA~\A-Y (B3) 
which concludes this appendix. 
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TABLE I: Efficiency of the various types of evaluation of forces in VMC for a water dimer 
molecule at experimental equilibrium atomic positions. The computational time to calculate the 
force components on the Oxygen atoms within a given statistical error is inversely proportional to 
the efficiency. The HFM efficiency is taken for reference equal to one, for a statistical accuracy of 
O.OOla.u.. Pseudopotential is used only for the Oxygen. 



Method 


HFM 


No-SWCT 


SWCT 


Efficiency 


1 


165 


1360 
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TABLE II: VMC optimized structure of the water monomer. 





Exp 


VMC 


LDA^ 


BLYP^ 


gpa 


CCSD(T) ^ 


doH{A) 


0.957^ 


0.954(1) 


0.973 


0.973 


0.974 


0.95829 


ZHOH (deg) 


104.5'^ 


104.61(10) 


104.4 


104.6 


104.1 


104.454 



From Ref. 23 1 
^ From Ref. 24 [ 
From Ref. 25 1 
From Ref. [261 
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TABLE III: VMC optimized structure of the water dimer. The LRDMC calculation 
was done only for the binding energy, by projecting the VMC WF. 





Exp 


VMC 


LRDMC 


LDA^ 


BLYP'^ 


PBE^ 


CCSD(T) " 


doo{A) 


2.98'^ 


2.969(2) 




2.70 


2.95 


2.89 


2.9089 


ZOHO (deg) 


174^ 


177.6(3) 




169 


173 


172 




binding (kCal/mol) 


5.0(7) f 


3.84(14) 


4.76(6) 


8.8 


4.3 


5.55 





^ From Ref. 


23 


^ From Ref. 
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^ From Ref. 
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From Ref. 


29 


^ From Ref. 
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f From Ref. 
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List of figures 



• Figure 1: Sliematic representation of the adjoint algorithm for the calculation of the 
local kinetic energy. Note that the inverse of the matrix A can be passed directly as 
an input (dotted arrow) of step 2 thus avoiding to repeat the matrix inversion. 

• Figure 2: Ratio of CPU time required to compute energies and all force components 
referenced to the one required for the simple energy calculation within VMC. The 
calculations refer to 1,2,4, and 32 water molecules. The inset is an expansion of the 
lower part of the plot. 

• Figure 3: oxygen-oxygen distance as a function of the number of iterations for de- 
termining the equilibrium zero-temperature structure of the water dimer. All the 18 
atomic coordinates, as well as about 1000 variational parameters of the electronic 
many-body WF are fully optimized with an iterative scheme 0, Q]. 

• Figure 4: Average internal energy as a function of the simulation time, for the molec- 
ular dynamics with QMC forces [1|] and four water molecules in a cubic box. Empty 
dots indicate VMC energy values corresponding to the time evolved WF with much 
smaller error bars (< OAmHa). Empty squares are obtained after optimizing the 
WF at fixed atomic positions, starting from the previous initial state and with quite 
accurate statistical accuracy (< OAmHa). 
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FIG. 1: S. Sorella and L. Capriotti, Journal of Chemical Physics 
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FIG. 4: S. Sorella and L. Capriotti, Journal of Chemical Physics 
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